
\[
\frac{\partial L_P}{\partial C_i^l} = \lambda \sum_{j}^{J} \Bigl( W_j^l \odot \bigl( C_i^l + \eta\,\delta_{\hat{C}_{i,j}^l} \bigr) - \hat{C}_{i,j}^l \Bigr) \odot W_j^l, \tag{3.52}
\]

where $\mathbf{1}$ is the indicator function [199], widely used to estimate the gradient of a nondifferentiable function. More specifically, the indicator function outputs 1 only if its condition is satisfied, and 0 otherwise.
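To make the indicator-based gradient concrete, the following is a minimal PyTorch sketch (our illustration, not the authors' implementation): it pairs a sign-like discrete projection in the forward pass with the indicator $\mathbf{1}_{-1 \le x \le 1}$ as the gradient estimator in the backward pass, the same condition that appears in Eq. (3.55). The sign projection and the clamp range $[-1, 1]$ are assumptions of the sketch.

```python
import torch

class IndicatorSTE(torch.autograd.Function):
    """Discrete projection forward; indicator-function gradient backward.
    A sketch only: the sign projection and clamp range are assumptions."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)  # non-differentiable projection

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # 1_{-1 <= x <= 1}: pass the gradient only where the condition
        # holds; the gradient is zero elsewhere.
        return grad_output * (x.abs() <= 1).to(grad_output.dtype)

# Usage: y = IndicatorSTE.apply(x)
```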

Updating $W_j^l$: Likewise, the gradient of the projection parameter, $\delta_{W_j^l}$, consists of the following two parts:

\[
\delta_{W_j^l} = \frac{\partial L}{\partial W_j^l} = \frac{\partial L_S}{\partial W_j^l} + \frac{\partial L_P}{\partial W_j^l}, \tag{3.53}
\]
\[
W_j^l \leftarrow W_j^l - \eta_2\,\delta_{W_j^l}, \tag{3.54}
\]

where $\eta_2$ is the learning rate for $W_j^l$. We also have the following:

\[
\begin{aligned}
\frac{\partial L_S}{\partial W_j^l}
&= \sum_{h} \Bigl[ \frac{\partial L_S}{\partial W_j^l} \Bigr]_h \\
&= \sum_{h} \Bigl[ \sum_{i}^{I} \frac{\partial L_S}{\partial \hat{C}_{i,j}^l} \odot \frac{\partial P_{\Omega_N}^{l,j}\bigl( W_j^l \odot C_i^l \bigr)}{\partial \bigl( W_j^l \odot C_i^l \bigr)} \odot \frac{\partial \bigl( W_j^l \odot C_i^l \bigr)}{\partial W_j^l} \Bigr]_h \\
&= \sum_{h} \Bigl[ \sum_{i}^{I} \frac{\partial L_S}{\partial \hat{C}_{i,j}^l} \odot \mathbf{1}_{-1 \le W_j^l \odot C_i^l \le 1} \odot C_i^l \Bigr]_h,
\end{aligned} \tag{3.55}
\]

\[
\frac{\partial L_P}{\partial W_j^l} = \lambda \sum_{h} \Bigl[ \sum_{i}^{I} \Bigl( W_j^l \odot \bigl( C_i^l + \eta\,\delta_{\hat{C}_{i,j}^l} \bigr) - \hat{C}_{i,j}^l \Bigr) \odot \bigl( C_i^l + \eta\,\delta_{\hat{C}_{i,j}^l} \bigr) \Bigr]_h, \tag{3.56}
\]

where $h$ indicates the $h$th plane of the tensor along the channels. This shows that the proposed algorithm can be trained end to end, and we summarize the training procedure in Algorithm 13. In the implementation, we use the mean of $W$ in the forward process but keep the original $W$ in backward propagation.
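This forward/backward asymmetry can be realized with a detach trick. Below is a minimal PyTorch sketch of the idea (our illustration, not the reference implementation; the axis over which the mean of $W$ is taken is an assumption):

```python
import torch

def mean_in_forward(W: torch.Tensor) -> torch.Tensor:
    """Forward value is the mean of W (broadcast back to W's shape),
    while gradients flow to the original W unchanged. The averaging
    axis (dim=0, the channel planes) is an assumption of this sketch."""
    W_mean = W.mean(dim=0, keepdim=True).expand_as(W)
    # Forward: W + (W_mean - W) == W_mean.
    # Backward: the detached residual carries no gradient, so dL/dW
    # is computed against the original W.
    return W + (W_mean - W).detach()
```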

Note that in PCNNs for BNNs, we set $U = 2$ and $a_2 = -a_1$. Two binarization processes are used in PCNNs. The first is kernel binarization, which is based on the projection onto $\Omega_N$, whose elements are calculated from the mean absolute value of all full-precision kernels per layer [199] as

\[
\frac{1}{I} \sum_{i}^{I} \bigl\| C_i^l \bigr\|_1, \tag{3.57}
\]

where I is the total number of kernels.
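As an illustration of Eq. (3.57), the per-layer element of $\Omega_N$ can be computed as below; this sketch assumes the layer's $I$ full-precision kernels are stacked along the first dimension of a single tensor:

```python
import torch

def omega_element(C: torch.Tensor) -> torch.Tensor:
    """(1/I) * sum_i ||C_i^l||_1: the mean L1 norm of the I
    full-precision kernels of one layer, as in Eq. (3.57)."""
    I = C.shape[0]
    return C.reshape(I, -1).abs().sum(dim=1).mean()
```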

3.5.7 Progressive Optimization

Training 1-bit CNNs is a highly non-convex optimization problem, and the initialization state significantly affects convergence. Unlike the method in [159], which initializes 1-bit CNN models from a real-valued CNN model with the clip function pre-trained on ImageNet, we propose a progressive optimization strategy for training 1-bit CNNs. Although a real-valued CNN model can achieve high classification accuracy, its converged state can differ from that of a 1-bit CNN and may therefore misguide the convergence of the 1-bit model.